Large‐Scale Proteome Profiling Identifies Biomarkers Associated with Suspected Neurosyphilis Diagnosis

Abstract Neurosyphilis (NS) is a central nervous system (CNS) infection caused by Treponema pallidum (T. pallidum). NS can occur at any stage of syphilis and manifests as a broad spectrum of clinical symptoms. Often referred to as “the great imitator,” NS can be easily overlooked or misdiagnosed due to the absence of standard diagnostic tests, potentially leading to severe and irreversible organ dysfunction. In this study, proteomic and machine learning techniques are used to characterize 223 cerebrospinal fluid (CSF) samples to identify diagnostic markers of NS and provide insights into the underlying mechanisms of the associated inflammatory responses. Three biomarkers (SEMA7A, SERPINA3, and ITIH4) are validated as contributors to NS diagnosis through multicenter verification of an additional 115 CSF samples. We anticipate that the identified biomarkers will become effective tools for assisting in the diagnosis of NS. Our insights into NS pathogenesis in brain tissue may inform therapeutic strategies and drug discovery for NS patients.


Protein extraction and tryptic digestion
Protease inhibitor was added to the CSF samples, and the samples were concentrated. Then, the supernatants were transferred to new centrifuge tubes. The top 14 high-abundance proteins were removed using High-Select Top 14 Abundant Protein Depletion Resin (Thermo Fisher Scientific). The supernatants were then collected, and the protein concentrations were determined using a Bradford kit according to the manufacturer's protocol. Next, the samples were transferred into 10-kDa ultrafiltration filters, and the liquid was replaced with UA buffer (8 M urea, 150 mM Tris-HCl, pH 8.0). Then, 25 mM DTT was added to the samples, which were incubated for 1 h at 37°C. After centrifugation at 14,000 × g for 15 min at 25°C, 50 mM iodoacetamide was added to the samples, followed by incubation in the dark for 30 min at 25°C. Next, 25 mM NH4HCO3 was added to the samples, which were then centrifuged for 10 min; this wash was repeated four times. Final digestion was performed at 37°C overnight by incubating the samples with trypsin (enzyme-to-substrate ratio of 1:50). The samples were washed three times using 100 μL of 25 mM NH4HCO3 and centrifuged at 12,000 × g for 10 min. Finally, the supernatants containing peptide mixtures were transferred to clean tubes for LC-MS/MS.
Organoid samples were scraped into new EP tubes containing 20 μL of urea buffer (8 M urea, 150 mM Tris-HCl, 10 mM DTT, pH 8.0). An additional 10 μL of buffer was added to each tube. Steel balls were added to the 30 μL of buffer, and the samples were vibrated (70 Hz) for 1 min. After centrifugation at 14,000 × g for 10 min at 4°C, the supernatants were transferred to clean tubes. Next, the extracted proteins were reduced at 37°C for 1 h and alkylated in 25 mM iodoacetamide at room temperature for 30 min in the dark. Finally, the protein samples were digested with Lys-C (1 μg at 37°C for 4 h) and trypsin (enzyme-to-substrate ratio of 1:50) at 37°C for 16 h, desalted using C18 cartridges, and vacuum-dried using a SpeedVac.
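As a small worked example of the 1:50 enzyme-to-substrate ratio used for the trypsin digestions, the sketch below computes the trypsin mass for a given protein input. It assumes the ratio is weight-to-weight, which the text does not state explicitly; the function name and the 100 μg input are hypothetical and serve only as illustration.

```python
def trypsin_mass_ug(protein_ug: float, ratio: float = 50.0) -> float:
    """Return the trypsin mass (in ug) for a 1:ratio enzyme-to-substrate ratio.

    Assumption: the ratio is weight-to-weight (w/w).
    """
    if protein_ug < 0 or ratio <= 0:
        raise ValueError("protein_ug must be >= 0 and ratio must be > 0")
    return protein_ug / ratio


# Hypothetical input: 100 ug of protein digested at 1:50.
print(trypsin_mass_ug(100.0))
```

Under these assumptions, 100 μg of protein corresponds to 2 μg of trypsin.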

High pH reversed-phase chromatography
For data-dependent acquisition (DDA) samples, the digests were further fractionated using high pH reversed-phase chromatography. One hundred micrograms of the digests were combined. The mixed peptides were separated on a reversed-phase chromatography column using a RIGOL L-3000 system (RIGOL, Beijing, China).
The peptide mixtures were dissolved in 100 μL of mobile phase A (2% [v/v] acetonitrile, 98% [v/v] ddH2O, pH 10) and then centrifuged at 14,000 × g for 20 min. The supernatants were loaded onto XBridge Peptide BEH C18 columns (130 Å, 3.5 μm, 4.6 mm × 150 mm; Waters Corp) and eluted stepwise with mobile phase B (98% [v/v] acetonitrile, 2% [v/v] ddH2O, pH 10). The flow rate was set at 1 mL/min. The fractions were eluted (1 min each) and collected using step gradients of mobile phase B. Forty fractions were collected over the LC separation and were subsequently pooled (in a nonsequential fashion) into 10 fractions. The final 10 fractions were freeze-dried and stored at −80°C.
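The text does not specify how the 40 fractions were assigned to the 10 pools. One common nonsequential scheme, shown here purely as an assumption, combines every tenth fraction so that each pool spans the whole gradient. The function name is hypothetical.

```python
def pool_fractions(n_fractions: int = 40, n_pools: int = 10) -> dict[int, list[int]]:
    """Assign fractions 1..n_fractions to pools 1..n_pools by concatenation.

    Assumption: fraction f goes to pool ((f - 1) mod n_pools) + 1, so each
    pool combines early-, mid-, and late-eluting fractions.
    """
    pools: dict[int, list[int]] = {p: [] for p in range(1, n_pools + 1)}
    for fraction in range(1, n_fractions + 1):
        pools[(fraction - 1) % n_pools + 1].append(fraction)
    return pools


for pool, members in pool_fractions().items():
    print(pool, members)
```

With this scheme, pool 1 would contain fractions 1, 11, 21, and 31, and every pool would contain four fractions.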
For DDA-MS runs, the full MS scan range was 300 to 1500 m/z. The MS resolution was set to 60,000, and MS/MS scans were acquired at a resolution of 15,000 in 2.5-s top-speed mode. For higher-energy collisional dissociation, the isolation window was set to 1.6 m/z, and a normalized collision energy of 32% was applied.
For DIA-MS runs, the full MS scan range was 300 to 1500 m/z. DIA segments were then acquired at a resolution of 30,000, and the collision energy was 33%.
The spectra were recorded in profile mode. The default charge state for MS2 was set to 3.

Proteomic MS/MS data processing
The Proteome Discoverer software (version 2.3.0.523; Thermo Fisher Scientific) was used to analyze the DDA data and search against the UniProt human database (downloaded on 2019-07-31, containing 73,940 proteins). The parameters used for database searches were set as follows: precursor and fragment mass tolerances of 10 ppm and 0.02 Da, respectively; trypsin as the digestion enzyme; a maximum of two missed cleavage sites; oxidation (M) and acetylation (protein N-terminus) set as dynamic modifications; and carbamidomethylation of cysteine set as a fixed modification. The identified proteins were filtered at both the peptide and protein levels at a 1% false discovery rate, determined by a target-decoy search strategy. All DDA results were loaded into Spectronaut (v.14.10.201222.47784; Biognosys, Switzerland) to generate the sample-specific spectral library. Then, the raw DIA data were processed in Spectronaut using the default settings. Briefly, the retention time prediction type was set to dynamic iRT and correction factor for the window. Mass calibration was set to local mass calibration. Decoy generation was set to scramble (no decoy limit). Interference correction at the MS2 level was enabled, removing fragments from quantification based on interfering signals while maintaining at least three fragments for quantification. The false discovery rate was estimated with the mProphet approach and set to 1% at the peptide level. Protein inference was performed on the principle of parsimony using the IDPicker algorithm implemented in Spectronaut. The RAW files were converted into the Spectronaut file format to analyze the DIA runs with the spectral library. The files were then calibrated in the retention time dimension using the global spectral library. Subsequently, the recalibrated files were used for targeted data analysis with the spectral library without new recalibration of the retention time dimension.
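To illustrate the idea behind the 1% target-decoy false discovery rate filter, the following generic sketch ranks identifications by score, estimates the FDR as the decoy-to-target ratio at each rank, and converts it to a q-value. It is not the procedure implemented in Proteome Discoverer or Spectronaut; the function name, the score convention (higher is better), and the example scores are all hypothetical.

```python
def filter_at_fdr(psms: list[tuple[float, bool]], threshold: float = 0.01) -> list[float]:
    """Return scores of target hits accepted at the given FDR threshold.

    psms: (score, is_decoy) pairs; a higher score means a better match.
    FDR at each rank is estimated as decoys / targets among all hits
    scoring at least as well (a simple target-decoy estimate).
    """
    ranked = sorted(psms, key=lambda hit: hit[0], reverse=True)
    targets = decoys = 0
    fdrs = []
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the minimum FDR at this rank or any lower-scoring rank.
    qvalues = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()

    return [score for (score, is_decoy), q in zip(ranked, qvalues)
            if not is_decoy and q <= threshold]


# Hypothetical scores: four targets and one decoy.
hits = [(10.0, False), (9.0, False), (8.0, False), (7.0, True), (6.0, False)]
print(filter_at_fdr(hits, threshold=0.01))
```

In this toy input, only the three targets that outscore the decoy pass the 1% threshold; the lowest-scoring target has a q-value of 0.25 and is rejected.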
A spectral library for PRM-MS analysis was also constructed from the DDA data, and unique peptides of the target proteins were selected and exported to set up the PRM-MS method. First, the raw MS files from the PRM-MS runs were processed in Skyline (v.20.1.0.155). Next, the top five product ions in the library for each target protein were used for comparison and quantification. The data were deemed reliable when the peak shape was intact and the retention time was within the set retention time range; undetected product ions were manually removed. Then, the peptide peak areas observed in patient samples were exported into Excel for further analysis.
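The PRM quantification step can be sketched as follows: pick the five product ions with the highest library intensity, drop any that were not detected, and combine the remaining peak areas. Summing the areas is an assumption, since the text does not say how the product ions were combined; the function name, ion labels, and numbers are hypothetical.

```python
def quantify_peptide(library_intensity: dict[str, float],
                     observed_area: dict[str, float],
                     top_n: int = 5) -> float:
    """Sum observed peak areas over the top_n library product ions.

    Ions are ranked by library intensity. Ions missing from observed_area
    are treated as undetected and skipped (assumption: this mirrors the
    manual removal of undetected product ions).
    """
    top_ions = sorted(library_intensity, key=library_intensity.get, reverse=True)[:top_n]
    return sum(observed_area[ion] for ion in top_ions if ion in observed_area)


# Hypothetical peptide: b3 is in the library's top five but was not detected.
library = {"y4": 100.0, "y5": 90.0, "y6": 80.0, "y7": 70.0, "b3": 60.0, "b4": 10.0}
observed = {"y4": 1000.0, "y5": 900.0, "y6": 800.0, "y7": 700.0, "b4": 50.0}
print(quantify_peptide(library, observed))
```

Here b4 is ignored because it is not among the top five library ions, and b3 is skipped because it was not detected, so the peptide area is the sum of the four detected y ions.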

Figure S2. Analysis of differentially expressed proteins of CSF samples between NS

Figure S4. Construction of scoring table for potential biomarkers based on proteomic

Figure S5. Identification of potential biomarkers that can distinguish NS from PTNS.

Figure S6. Verification of potential biomarkers in the PTNS and SNS. A) ELISA